2025-10-19
Meeting you!
A speaking exercise of sorts
Notes to myself and a space for learning new things
The representation of numbers … should be directly proportional to the numerical quantities measured. — Edward Tufte 1983
Data visualization plays a crucial role in transforming complex data into accessible and interpretable formats, including charts, graphs, scatter plots, or other visualization types (Gubala and Meloncon, 2022).
Data visualization is “the representation and presentation of data to facilitate understanding”, a definition that frames comprehension as a dynamic process involving perception, interpretation, and reasoning (Kirk, 2016).
In how many ways can we represent or communicate two quantities, such as 75 and 37?
Visualization leverages the human visual system to facilitate:
Our brain is capable of quick and efficient visual information processing at a pre-conscious level.
For a data visualization task, we can think in terms of the nested model of the visualization design process.
The best final visual depends on:
“The simple graph has brought more information to the data analyst’s mind than any other device.” — John Tukey
“Although we often hear that data speak for themselves, their voices can be soft and sly.” — Mosteller et al. (1983)
Source for example: Knaflic, Cole Nussbaumer. Storytelling with Data: A Data Visualization Guide for Business Professionals. Wiley, 2015.
“Data and information visualization is concerned with showing quantitative and qualitative information, so that a viewer can see patterns, trends or anomalies, constancy or variation, in ways that other forms—text and tables—do not allow” — Michael Friendly
gg in “ggplot2” stands for Grammar of Graphics, from the book by Leland Wilkinson
ggplot() is the main function in ggplot2
Construct plots by adding (+) layers – Not the %>% pipe!
Many types of geometries:
geom_point(), geom_histogram(), geom_line(), geom_boxplot(), etc.
ggplot(data = [dataset],   # Data
       # Aesthetics
       mapping = aes(x = [x-variable], y = [y-variable])) +
  # Geometries
  geom_[*]() +
  # ...
  [other options/layers as needed]
Measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex               <fct> male, female, female, NA, female, male, female, male…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
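library(ggplot2)

# Palmer penguins data from the palmerpenguins package
penguins <- palmerpenguins::penguins

# Scatter plot of bill depth against bill length, coloured by species
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm, y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Penguin bill depth & length",
       x = "Bill depth (mm)",
       y = "Bill length (mm)",
       colour = "Species")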
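As a hedged sketch of the layering idea, the same scatter plot can be built up step by step, with each `+` adding one layer or option. The linear smoother and the theme are illustrative choices, not part of the original example.

```r
library(ggplot2)

penguins <- palmerpenguins::penguins

# Base plot: data and aesthetic mappings only, no geometry yet
p_base <- ggplot(data = penguins,
                 mapping = aes(x = bill_depth_mm, y = bill_length_mm,
                               colour = species))

# Each `+` adds one layer or option on top of the previous ones
p_layers <- p_base +
  geom_point(na.rm = TRUE) +                      # layer 1: points
  geom_smooth(method = "lm", formula = y ~ x,     # layer 2: per-species linear trend
              se = FALSE, na.rm = TRUE) +         #   (assumed choice of smoother)
  theme_minimal()                                 # assumed theme, not a layer

p_layers
```

Because the colour mapping is set in `ggplot()`, both geometries inherit it, so the trend lines are fitted and drawn per species.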
However, more is not necessarily better: the design choices should be deliberate.
There are many types of data visualisation, each designed with a particular purpose.
To determine which visualisation style is appropriate, consider:
## Rows: 4,334
## Columns: 6
## $ species   <chr> "Artibeus cinereus", "Artibeus cinereus", "Artibeus cinereus…
## $ body_mass <dbl> 12, 13, 11, 8, 13, 19, 12, 16, 11, 15, 18, 11, 11, 12, 14, 1…
## $ forearm   <dbl> 41.0, 38.6, 39.1, 39.2, 39.0, 43.0, 40.0, 41.0, 41.0, 40.5, …
## $ age       <chr> "adult", "adult", "adult", "adult", "adult", "adult", "juven…
## $ sex       <chr> "male", "female", "male", "male", "male", "female", "male", …
## $ year      <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, …
| Variable | Description | Type |
|---|---|---|
| species | Name of bat species | Categorical (5 cases) |
| body_mass | Body mass (in grams) | Numerical, continuous |
| forearm | Forearm length (in mm) | Numerical, continuous |
| age | Adult or Juvenile | Categorical (2 cases) |
| sex | Female or Male | Categorical (2 cases) |
| year | Year of measurement | Numerical, discrete |
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
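The `stat_bin()` message above is ggplot2's reminder that the default of 30 bins is rarely the best choice. A minimal sketch of setting `binwidth` explicitly follows. Since the bat data set is not included here, it uses the `penguins` data instead, and the 200 g bin width is an assumed value to be tuned by eye.

```r
library(ggplot2)

penguins <- palmerpenguins::penguins

# Histogram of body mass with an explicit bin width (200 g is an assumption)
p_hist <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 200, na.rm = TRUE,
                 fill = "grey70", colour = "white") +
  labs(x = "Body mass (g)", y = "Count")

p_hist
```

With `binwidth` supplied, ggplot2 no longer prints the message, and the bins have an interpretable width in the units of the variable.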
As a 2x2 setting
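One way to obtain a 2x2 arrangement is the patchwork package, which combines separate ggplot objects into one figure. The sketch below is an assumption about how such a layout could be built, not the code behind the original figure, and it uses the `penguins` data because the bat data are not included here.

```r
library(ggplot2)
library(patchwork)  # assumed to be installed; not mentioned in the original

penguins <- palmerpenguins::penguins

# Four plot types, one per panel
p1 <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 200, na.rm = TRUE)
p2 <- ggplot(penguins, aes(x = species, y = body_mass_g)) +
  geom_boxplot(na.rm = TRUE)
p3 <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE)
p4 <- ggplot(penguins, aes(x = species)) +
  geom_bar()

# Two plots per row gives a 2x2 grid
grid_2x2 <- wrap_plots(p1, p2, p3, p4, ncol = 2)
grid_2x2
```

If all four panels showed the same plot type split by a variable, `facet_wrap(ncol = 2)` in ggplot2 alone would be the simpler route.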
ggplot2 is used to create a wide variety of data visualisation styles. It has many customisation tools to support your communication.
Use ggplot2 manual pages (and online searches) to support your creation.
A good starting point is the R Graph Gallery.
However, excessive customisation can be distracting.
In general, there are various resources to explore
Leveraging Ensemble and Hybrid Forecasting Tools to Increase Accuracy: Turkey COVID-19 Case Study
Mathematics Students’ Adoption and Perceptions of Generative AI tools -Results from a Survey
Cluster-specific ranking and variable importance for Scottish regional deprivation via vine mixtures
UG project: Clustering Dementia Characteristics Using Vine Copula Mixture Models
Machine Learning Meets Hospitality: Explainable AI in Hotel Booking Cancellations
Shiny based web apps for specific topics
Targeting visual learners with interactivity, implemented in some courses such as Statistics or Data Science
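To give a flavour of what such an app involves, here is a minimal Shiny sketch in which a slider controls a histogram's bin width. It is not one of the apps used in the courses mentioned above. The input name `binwidth`, the output name `hist`, and the slider range are all hypothetical.

```r
library(shiny)
library(ggplot2)

penguins <- palmerpenguins::penguins

# User interface: one slider and one plot (names are illustrative)
ui <- fluidPage(
  titlePanel("Penguin body mass"),
  sliderInput("binwidth", "Bin width (g)",
              min = 50, max = 500, value = 200, step = 50),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider changes
server <- function(input, output, session) {
  output$hist <- renderPlot({
    ggplot(penguins, aes(x = body_mass_g)) +
      geom_histogram(binwidth = input$binwidth, na.rm = TRUE) +
      labs(x = "Body mass (g)", y = "Count")
  })
}

# Build the app object; launch it interactively with runApp(app)
app <- shinyApp(ui = ui, server = server)
```

Letting learners move the slider and watch the shape of the histogram change is the kind of interactivity that a static slide cannot offer.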
One example if time allows! - LATER
| Package | Interactivity | Typical Tasks | Standout Strengths |
|---|---|---|---|
| plotly (R) | Interactive (HTML) | Dashboards, rich interactive plots | Supports many chart types—including 3D, contour, maps—and works offline |
| highcharter | Interactive (JS via R) | Themed charts, interactive dashboards | Leverages Highcharts JS with professional themes (“economist”, “FT”, “538”) |
| leaflet (R) | Interactive maps | Web maps with markers, layers | R binding for Leaflet.js; great for interactive maps in Shiny/R Markdown |
| dygraphs | Interactive (JS time series) | Time-series charts with brushing | Handles large time-series efficiently, integrates RColorBrewer palettes |
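As a small illustration of the first row of the table, an existing ggplot object can be converted into an interactive HTML widget with `ggplotly()` from plotly. This is a sketch using the `penguins` data, and the plot itself is an assumed example.

```r
library(ggplot2)
library(plotly)

penguins <- palmerpenguins::penguins

# Start from an ordinary static ggplot
p_static <- ggplot(penguins,
                   aes(x = bill_depth_mm, y = bill_length_mm,
                       colour = species)) +
  geom_point(na.rm = TRUE)

# Convert it to an interactive plotly widget (hover, zoom, legend toggling)
p_interactive <- ggplotly(p_static)
p_interactive
```

The result renders in the RStudio viewer, in R Markdown or Quarto HTML output, and inside Shiny apps.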
Best Practices for Data Visualisation aims to equip readers with the fundamentals for creating data visualizations that are high-quality, readable, impactful, and accurate in both presentation and interpretation.
The guide blends both the scientific and creative aspects of visualization — it emphasizes that data viz is not just plotting defaults, but actively telling a story and choosing design elements deliberately to serve that story
Though primarily aimed at RSS publications (Significance magazine, JRSS Series A, Real World Data Science), the guidance is broadly relevant for any data viz task
Thanks to Andreas Krause, Nicola Rennie, and Brian Tarran
To select the right chart type, consider:
Checklist:
Emphasis is placed on styling for accessibility, using thoughtful choices in color, annotation, fonts, and alternative text
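The accessibility choices mentioned above can be sketched in ggplot2 as follows. This is an illustration under assumptions rather than the guide's own code: the viridis palette, the redundant shape encoding, the font size, and the alt text wording are all choices made here, and `labs(alt = ...)` requires ggplot2 3.3.4 or later.

```r
library(ggplot2)

penguins <- palmerpenguins::penguins

p_access <- ggplot(penguins,
                   aes(x = bill_depth_mm, y = bill_length_mm,
                       colour = species, shape = species)) +  # species encoded twice
  geom_point(size = 2, na.rm = TRUE) +
  scale_colour_viridis_d(end = 0.9) +       # colour-blind-friendly palette (assumed choice)
  labs(title = "Penguin bill depth & length",
       x = "Bill depth (mm)",
       y = "Bill length (mm)",
       colour = "Species", shape = "Species",
       # Alternative text for screen readers (wording is an assumption)
       alt = "Scatter plot of bill length against bill depth for three penguin species.") +
  theme_minimal(base_size = 14)             # larger base font for readability

p_access
```

Encoding species by both colour and shape keeps the groups distinguishable in greyscale print, and the alt text can be retrieved with `get_alt_text(p_access)`.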
Key styling components:
Knowing where to start is not always easy, so it helps to explore good published examples.
There are dedicated visual galleries for both R and Python that pair each image with the code that created it.
To understand each chart type and when to apply it, see From Data to Viz.
It can be a good companion in projects…
LLM-based support in some R or Python packages, such as PandasAI
Code generation is getting easier, e.g. GitHub Copilot
Package-specific AI helpers, such as Shiny Assistant
AI-supported data viz will become more prominent, but when it comes to design and interpretation an active role falls to us, as it should!
When preparing data viz in programming languages, LLM support is likely to grow; in other words, we have already started drawing visuals by talking, and this may become even more widespread
Shiny-based and similar platforms can offer users more flexibility, and alternatives driven by user queries can be generated on the spot
Design and interpretation remain very important, so it matters to work through plenty of such examples and to understand what plots of different data are telling us
It pays to experiment a lot: as the sayings go, no pain no gain, or there is no meal without effort
Making generous use of open-source examples and playing at reproducing them can be a good activity
It is worth following particular people as role models, such as Cara Thompson, and examining the visuals in what we read with a critical eye
Data visualisation competitions and similar events can be organised with particular groups, and online events (such as R Consortium and Python webinars) can be followed regularly
Reading at regular intervals about which R and Python packages and examples are useful in which situations can be turned into a habit
Source: Herkese Bilim Teknoloji Gazetesi, Issue 493
Source: Oksijen Gazetesi, Financial Times supplement
Like a squirrel I wandered on and on
For now I have found peace in Edinburgh
My call to the curious: keep plotting data with R
And those who wish to walk the road together may come and find me :)
For this presentation and the visuals in it: https://github.com/oevkaya/18StatColloqTalk